Efficient Communication Using Message Prediction for Cluster Multiprocessors
Abstract
With the increasing uniprocessor and SMP computation power available today, interprocessor communication has become an important factor limiting the performance of clusters of workstations. Many factors, including communication hardware overhead, communication software overhead, and user environment overhead (multithreading, multiuser), affect the performance of the communication subsystems in such systems. A significant portion of the software communication overhead is attributable to message-copying operations. Ideally, a true zero-copy protocol would move the message directly from the send buffer in user space to the receive buffer at the destination, without any intermediate buffering. However, because message-passing applications at the send side do not know the final receive buffer addresses, early-arriving messages have to be buffered in a temporary area. In this paper, we show that message-passing applications exhibit message reception communication locality. We have exploited this locality to devise different message predictors at the receiver side of communications. In essence, these message predictors can be used to drain the network efficiently and cache incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies.
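The abstract does not say how the predictors are built, so the following is only an illustrative sketch of what a receiver-side predictor and its hit ratio could look like. It assumes each incoming message is identified by its sender rank alone. The two predictors shown (`LastValuePredictor` and `SuccessorPredictor`) and the `hit_ratio` helper are hypothetical names chosen for this example, not the paper's designs.

```python
from typing import Dict, Iterable, Optional


class LastValuePredictor:
    """Hypothetical predictor: guesses the next sender equals the previous one."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def predict(self) -> Optional[int]:
        return self.last

    def update(self, sender: int) -> None:
        self.last = sender


class SuccessorPredictor:
    """Hypothetical predictor: remembers which sender followed each sender last time."""

    def __init__(self) -> None:
        self.last: Optional[int] = None
        self.successor: Dict[int, int] = {}

    def predict(self) -> Optional[int]:
        if self.last is None:
            return None
        return self.successor.get(self.last)

    def update(self, sender: int) -> None:
        if self.last is not None:
            self.successor[self.last] = sender
        self.last = sender


def hit_ratio(senders: Iterable[int], predictor) -> float:
    """Fraction of messages whose sender was predicted before the message arrived."""
    hits = 0
    total = 0
    for sender in senders:
        if predictor.predict() == sender:
            hits += 1
        predictor.update(sender)  # learn from the message that actually arrived
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # A made-up, perfectly periodic reception pattern from ranks 1, 2, 3.
    pattern = [1, 2, 3] * 4
    print("last-value:", hit_ratio(pattern, LastValuePredictor()))
    print("successor: ", hit_ratio(pattern, SuccessorPredictor()))
```

On a repeating pattern such as the made-up one above, the last-value guess never matches, while the successor table hits on every message once it has seen one full period. This is the kind of reception locality the abstract refers to, though real traces would also involve tags, message sizes, and less regular patterns.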
Similar resources
Efficient Communication Using Message Prediction for Cluster of Multiprocessors
With the increasing uniprocessor and SMP computation power available today, interprocessor communication has become an important factor that limits the performance of clusters of workstations. Many factors including communication hardware overhead, communication software overhead, and the user environment overhead (multithreading, multiuser) affect the performance of the communication subsystems...
A comparison of techniques used for mapping parallel algorithms to message-passing multiprocessors
This paper presents a comparison study of popular clustering and mapping heuristics which are used to map task-flow graphs to message-passing multiprocessors. To this end, we use task-graphs which are representative of important scientific algorithms running on data-sets of practical interest. The annotation which assigns weights to nodes and edges of the task-graphs is realistic. It reflects cu...
A Comparison Study of Heuristics for Mapping Parallel Algorithms to Message-passing Multiprocessors
This paper presents a comparison study of popular clustering and mapping heuristics which are used to map task-flow graphs to message-passing multiprocessors. To this end, we use task-graphs which are representative of important scientific algorithms running on data-sets of practical interest. The annotation which assigns weights to nodes and edges of the task-graphs is realistic. It reflects curren...
Scalable Inter-Cluster Communication Systems for Clustered Multiprocessors
As workstation clusters move away from uniprocessors in favor of multiprocessors to support the increasing computational needs of distributed applications, greater demands are placed on the communication interfaces that couple individual workstations. This paper investigates scalable, efficient, and reliable communication systems for multiprocessor clusters that use commodity local area networks ...
Predicting MPI Buffer Addresses
Communication latencies have been identified as one of the performance limiting factors of message passing applications in clusters of workstations/multiprocessors. On the receiver side, message-copying operations contribute to these communication latencies. Recently, prediction of MPI messages has been proposed as part of the design of a zero message-copying mechanism. Until now, prediction wa...